LCORE-1446: Add support for metadata filters in Solr vector search #1581
anik120 wants to merge 1 commit into lightspeed-core:main from
Conversation
Walkthrough: Adds structured Solr metadata filter support and documentation; updates query-parameter normalization to extract nested
Changes
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes. Pre-merge checks: ✅ 5 passed.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/models/requests.py (1)
127-150: ⚠️ Potential issue | 🟡 Minor: Update the canonical `filters` description to match the new routing.
`filters` is no longer always passed through as `params["solr"]`; structured metadata filters are extracted to top-level `params["filters"]` in `src/utils/vector_search.py`. Keeping this doc/field description stale conflicts with the new examples.
Proposed doc update:

```diff
 class SolrVectorSearchRequest(BaseModel):
     """LCORE Solr inline RAG options for ``vector_io.query`` (mode and provider filters).

     Attributes:
         mode: Solr vector_io search mode. When omitted, the server default (hybrid) is used.
-        filters: Solr provider filter payload passed through as params['solr'].
+        filters: Solr provider filter payload. Structured metadata filters under
+            a nested ``filters`` key are sent to top-level ``params["filters"]``;
+            remaining legacy Solr options are passed under ``params["solr"]``.
@@
     filters: Optional[dict[str, Any]] = Field(
         None,
-        description="Solr provider filter payload passed through as params['solr'].",
+        description=(
+            "Solr provider filter payload. Structured metadata filters under "
+            "a nested 'filters' key are sent as top-level vector_io filters; "
+            "remaining legacy Solr options are passed under params['solr']."
+        ),
         examples=[{"fq": ["product:*openshift*", "product_version:*4.16*"]}],
     )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/models/requests.py` around lines 127 - 150, Update the Field description for the filters attribute in the LCORE Solr inline RAG options (the filters field defined in this model used by vector_io.query) to reflect the new routing: explain that structured metadata filters are extracted to top-level params["filters"] while remaining Solr-specific payloads are kept under params["solr"], and adjust the example text accordingly so it no longer states that filters are always passed through as params["solr"]; reference the extraction logic in src/utils/vector_search.py when rewording to make the distinction clear.
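The routing this comment describes (structured metadata filters lifted to top-level `params["filters"]`, everything else kept under `params["solr"]`) can be sketched as follows. `_build_query_params` and `SolrVectorSearchRequest` are names from the PR; the stub class, function body, and default `k` below are an illustrative reconstruction under that description, not the project's actual code.

```python
from typing import Any, Optional


class SolrRequestStub:
    """Hypothetical stand-in for SolrVectorSearchRequest (illustration only)."""

    def __init__(
        self,
        mode: Optional[str] = None,
        filters: Optional[dict[str, Any]] = None,
    ) -> None:
        self.mode = mode
        self.filters = filters


def build_query_params(solr: Optional[SolrRequestStub], default_k: int = 5) -> dict[str, Any]:
    """Sketch of the routing: structured filters go top-level, legacy ones under 'solr'.

    default_k stands in for SOLR_VECTOR_SEARCH_DEFAULT_K from the project's constants.
    """
    params: dict[str, Any] = {"k": default_k}
    if solr is None or not solr.filters:
        return params
    # Nested structured metadata filters are lifted to top-level params["filters"].
    if "filters" in solr.filters:
        params["filters"] = solr.filters["filters"]
    # Everything else (legacy fq, custom options) stays under params["solr"].
    remaining = {k: v for k, v in solr.filters.items() if k != "filters"}
    if remaining:
        params["solr"] = remaining
    return params
```

With a mixed payload, the structured part and the legacy `fq` end up in different keys, which is the behavioral split the review comments keep referring to.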
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/models/requests.py`:
- Around line 290-312: Reformat the unformatted nested dict/list literals in the
Solr examples block (the entries containing "mode": "hybrid", "mode":
"semantic", and the {"filters": {"fq": ["product:*openshift*"]}} entry) to
comply with Black style; run Black (or your project's pre-commit formatting) on
src/models/requests.py so the nested dictionaries and lists are reformatted
consistently (wrap long lines, align brackets and commas) and commit the
reflowed code.
In `@src/utils/vector_search.py`:
- Line 90: Reformat the long dict comprehension into Black-friendly multi-line
form: break the expression that assigns remaining_filters from
solr.filters.items() across lines so Black won't reflow it; locate the
assignment to remaining_filters (the comprehension over solr.filters.items())
and split the dict literal, the for clause, and the if clause onto separate
lines consistent with Black's preferred formatting.
- Around line 61-68: Update the docstring for the function that builds the
parameter dictionary for vector_io.query (the code handling the solr argument)
to clarify that only structured filters (the structured Solr request: mode and
structured filters) are lifted to the top level, while legacy/unnested filter
payloads remain nested under params["solr"]; explicitly mention the solr
argument and that legacy filters are preserved under params["solr"] so callers
know which filters are moved vs retained.
In `@tests/unit/utils/test_vector_search.py`:
- Around line 83-136: Reformat the inline
SolrVectorSearchRequest.model_validate(...) test fixtures in
tests/unit/utils/test_vector_search.py to Black-compliant multi-line style with
trailing commas (for each dict passed to
SolrVectorSearchRequest.model_validate), then re-run Black/flake formatting;
focus on the fixtures in test_with_filters_and_other_solr_params,
test_with_compound_filter, and the earlier test block that calls
_build_query_params(solr=solr) so the dict literals are split across lines,
keys/values have trailing commas, and the file passes Black. Use the same symbol
references (SolrVectorSearchRequest.model_validate and _build_query_params) to
locate and update the offending calls.
---
Outside diff comments:
In `@src/models/requests.py`:
- Around line 127-150: Update the Field description for the filters attribute in
the LCORE Solr inline RAG options (the filters field defined in this model used
by vector_io.query) to reflect the new routing: explain that structured metadata
filters are extracted to top-level params["filters"] while remaining
Solr-specific payloads are kept under params["solr"], and adjust the example
text accordingly so it no longer states that filters are always passed through
as params["solr"]; reference the extraction logic in src/utils/vector_search.py
when rewording to make the distinction clear.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 7227812e-62f3-4284-997b-ba1189f40da7
📒 Files selected for processing (3)
src/models/requests.py, src/utils/vector_search.py, tests/unit/utils/test_vector_search.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (15)
- GitHub Check: radon
- GitHub Check: Pylinter
- GitHub Check: unit_tests (3.12)
- GitHub Check: unit_tests (3.13)
- GitHub Check: integration_tests (3.13)
- GitHub Check: integration_tests (3.12)
- GitHub Check: build-pr
- GitHub Check: E2E: library mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 3
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: E2E: library mode / ci / group 1
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: E2E Tests for Lightspeed Evaluation job
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
**/*.py: Use absolute imports for internal modules: `from authentication import get_auth_dependency`
Import FastAPI dependencies with: `from fastapi import APIRouter, HTTPException, Request, status, Depends`
Import Llama Stack client with: `from llama_stack_client import AsyncLlamaStackClient`
Check `constants.py` for shared constants before defining new ones
All modules start with descriptive docstrings explaining purpose
Use `logger = get_logger(__name__)` from `log.py` for module logging
Type aliases defined at module level for clarity
Use `Final[type]` as type hint for all constants
All functions require docstrings with brief descriptions
Complete type annotations for parameters and return types in functions
Use `typing_extensions.Self` for model validators in Pydantic models
Use modern union type syntax `str | int` instead of `Union[str, int]`
Use `Optional[Type]` for optional type hints
Use snake_case with descriptive, action-oriented function names (`get_`, `validate_`, `check_`)
Avoid in-place parameter modification anti-patterns; return new data structures instead
Use `async def` for I/O operations and external API calls
Handle `APIConnectionError` from Llama Stack in error handling
Use standard log levels with clear purposes: debug, info, warning, error
All classes require descriptive docstrings explaining purpose
Use PascalCase for class names with standard suffixes: Configuration, Error/Exception, Resolver, Interface
Use ABC for abstract base classes with `@abstractmethod` decorators
Use `@model_validator` and `@field_validator` for Pydantic model validation
Complete type annotations for all class attributes; use specific types, not `Any`
Follow Google Python docstring conventions with Parameters, Returns, Raises, and Attributes sections
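A minimal module following the guidelines above might look like the sketch below. `get_logger`/`log.py` are the project helpers named in the guidelines, stubbed here with stdlib `logging` so the example is self-contained; `DEFAULT_MODE` and `get_search_mode` are invented for illustration.

```python
"""Illustrative module demonstrating the project's Python conventions (sketch only)."""

import logging
from typing import Final, Optional


# In the real project this would be: from log import get_logger
def get_logger(name: str) -> logging.Logger:
    """Stand-in for the project's log.get_logger helper."""
    return logging.getLogger(name)


logger = get_logger(__name__)

# Constants carry Final[...] type hints.
DEFAULT_MODE: Final[str] = "hybrid"


def get_search_mode(requested: Optional[str]) -> str:
    """Return the requested Solr search mode, falling back to the default.

    Parameters:
        requested: Mode from the request, or None to use the default.

    Returns:
        The effective search mode.
    """
    # Return a new value rather than mutating the input in place.
    return requested if requested is not None else DEFAULT_MODE
```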
Files:
src/utils/vector_search.py, src/models/requests.py, tests/unit/utils/test_vector_search.py
src/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Pydantic models extend `ConfigurationBase` for config, `BaseModel` for data models
Files:
src/utils/vector_search.py, src/models/requests.py
tests/{unit,integration}/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests
Do not use unittest; pytest is the standard for this project
Use `pytest-mock` for AsyncMock objects in tests
Use marker `pytest.mark.asyncio` for async tests
Unit tests require 60% coverage, integration tests 10%
Files:
tests/unit/utils/test_vector_search.py
🧠 Learnings (4)
📚 Learning: 2026-03-17T11:34:53.242Z
Learnt from: are-ces
Repo: lightspeed-core/lightspeed-stack PR: 1335
File: docs/config.md:411-411
Timestamp: 2026-03-17T11:34:53.242Z
Learning: In the lightspeed-stack project (`src/models/config.py`, `docs/config.md`), the internal Solr filter `is_chunk:true` (defined as `SOLR_CHUNK_FILTER_QUERY` in `src/constants.py`) is always injected by the system for OKP searches and is intentionally hidden from users. The `chunk_filter_query` field in `OkpConfiguration` is user-facing and additive-only, but the documentation must NOT mention the internal `is_chunk:true` behavior — this is a deliberate design decision by the maintainers.
Applied to files:
src/utils/vector_search.py, src/models/requests.py, tests/unit/utils/test_vector_search.py
📚 Learning: 2026-01-12T10:58:40.230Z
Learnt from: blublinsky
Repo: lightspeed-core/lightspeed-stack PR: 972
File: src/models/config.py:459-513
Timestamp: 2026-01-12T10:58:40.230Z
Learning: In lightspeed-core/lightspeed-stack, for Python files under src/models, when a user claims a fix is done but the issue persists, verify the current code state before accepting the fix. Steps: review the diff, fetch the latest changes, run relevant tests, reproduce the issue, search the codebase for lingering references to the original problem, confirm the fix is applied and not undone by subsequent commits, and validate with local checks to ensure the issue is resolved.
Applied to files:
src/models/requests.py
📚 Learning: 2026-02-25T07:46:33.545Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:33.545Z
Learning: In the Python codebase, requests.py should use OpenAIResponseInputTool as Tool while responses.py uses OpenAIResponseTool as Tool. This difference is intentional due to differing schemas for input vs output tools in llama-stack-api. Apply this distinction consistently to other models under src/models (e.g., ensure request-related tools use the InputTool variant and response-related tools use the ResponseTool variant). If adding new tools, choose the corresponding InputTool or Tool class based on whether the tool represents input or output, and document the rationale in code comments.
Applied to files:
src/models/requests.py
📚 Learning: 2026-04-15T18:54:07.540Z
Learnt from: Lifto
Repo: lightspeed-core/lightspeed-stack PR: 1510
File: src/models/requests.py:769-773
Timestamp: 2026-04-15T18:54:07.540Z
Learning: In lightspeed-core/lightspeed-stack request models, keep schema-level size limits defined as inline numeric literals (e.g., Pydantic Field max_length values used for validation) rather than extracting them into constants.py. Only extract values intended as configurable runtime defaults (e.g., DEFAULT_* settings like header/file upload sizes). Do not flag inline numeric literals used directly in Pydantic Field constraints or field validators as an extraction-to-constants issue.
Applied to files:
src/models/requests.py
🪛 GitHub Actions: Black
src/utils/vector_search.py
[error] 1-1: Black --check would reformat this file.
src/models/requests.py
[error] 1-1: Black --check would reformat this file.
tests/unit/utils/test_vector_search.py
[error] 1-1: Black --check would reformat this file.
🔇 Additional comments (1)
src/models/requests.py (1)
777-779: ResponsesRequest doc update looks aligned. The docstring now advertises structured and legacy Solr filter support for the Responses API request model.
```python
        logger.debug("Extracted filters from solr.filters: %s", params["filters"])

        # Pass remaining solr.filters content (legacy fq, etc.) to params["solr"]
        remaining_filters = {k: v for k, v in solr.filters.items() if k != "filters"}
```

Format the comprehension with Black.
CI is already failing Black; this long single-line dict comprehension is one of the new spots that will be rewritten.
Black-style formatting:

```diff
-        remaining_filters = {k: v for k, v in solr.filters.items() if k != "filters"}
+        remaining_filters = {
+            k: v for k, v in solr.filters.items() if k != "filters"
+        }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/utils/vector_search.py` at line 90, Reformat the long dict comprehension
into Black-friendly multi-line form: break the expression that assigns
remaining_filters from solr.filters.items() across lines so Black won't reflow
it; locate the assignment to remaining_filters (the comprehension over
solr.filters.items()) and split the dict literal, the for clause, and the if
clause onto separate lines consistent with Black's preferred formatting.
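As a sanity check, Black's multi-line rewrite of the comprehension is behavior-identical to the single-line form; `solr_filters` below is a hypothetical payload invented for the comparison, not data from the PR.

```python
# Hypothetical stand-in for the solr.filters payload discussed in the review.
solr_filters = {
    "filters": {"type": "eq", "key": "platform", "value": "openshift"},
    "fq": ["product:*openshift*"],
}

# Single-line form flagged by Black's line-length check.
one_line = {k: v for k, v in solr_filters.items() if k != "filters"}

# Multi-line form in the style Black produces; same keys, same values.
multi_line = {
    k: v for k, v in solr_filters.items() if k != "filters"
}

assert one_line == multi_line == {"fq": ["product:*openshift*"]}
```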
Updates the Solr vector search integration to support the new Filter API
introduced in llama-stack 0.6.0, enabling metadata-based filtering of RAG
results using comparison and compound filters.
Filter format examples:
- Simple: `{"filters": {"type": "eq", "key": "platform", "value": "openshift"}}`
- Compound: `{"filters": {"type": "and", "filters": [...]}}`
Note: This change requires lightspeed-providers with solr_vector_io filter support,
introduced in lightspeed-core/lightspeed-providers#119
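The two filter shapes above can be built and checked with a small sketch; the `type`/`key`/`value` schema follows the examples in this commit message, while the helper names (`eq`, `and_`, `validate_filter`) are invented for illustration and are not part of the PR or llama-stack.

```python
from typing import Any


def eq(key: str, value: Any) -> dict[str, Any]:
    """Build a simple comparison filter in the shape shown above."""
    return {"type": "eq", "key": key, "value": value}


def and_(*filters: dict[str, Any]) -> dict[str, Any]:
    """Combine filters into a compound AND filter."""
    return {"type": "and", "filters": list(filters)}


def validate_filter(f: dict[str, Any]) -> bool:
    """Recursively check that a filter dict matches the documented shapes."""
    if f.get("type") == "and":
        # Compound filters nest a list of sub-filters.
        return all(validate_filter(sub) for sub in f.get("filters", []))
    # Comparison filters need both a key and a value.
    return "key" in f and "value" in f


payload = {"filters": and_(eq("platform", "openshift"), eq("status", "released"))}
```

A payload built this way nests cleanly under the request's `filters` key, matching the compound example in the commit message.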
b6f480c to 3b5b567 (Compare)
Actionable comments posted: 1
♻️ Duplicate comments (3)
src/models/requests.py (1)
290-312: ⚠️ Potential issue | 🟡 Minor: Run Black on the Solr examples.
CI still reports `src/models/requests.py` would be reformatted; this example block has missing trailing commas and a long inline dict/list literal.
Black-style formatting:

```diff
 {
     "mode": "hybrid",
     "filters": {
         "filters": {
             "type": "eq",
             "key": "platform",
-            "value": "openshift"
+            "value": "openshift",
         }
-    }
+    },
 },
 {
     "mode": "semantic",
     "filters": {
         "filters": {
             "type": "and",
             "filters": [
                 {"type": "eq", "key": "platform", "value": "openshift"},
-                {"type": "in", "key": "version", "value": ["4.14", "4.15", "4.16"]}
-            ]
+                {
+                    "type": "in",
+                    "key": "version",
+                    "value": ["4.14", "4.15", "4.16"],
+                },
+            ],
         }
-    }
+    },
 },
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/models/requests.py` around lines 290-312, the example Solr query literals (the inline dict/list block containing the hybrid/semantic/filter entries) are not Black-formatted: add the missing trailing commas and reformat the long inline dict/list across multiple lines to satisfy Black; specifically update the three example entries (the dict with "mode": "hybrid", the dict with "mode": "semantic" and nested "filters" list, and the final {"filters": {"fq": ["product:*openshift*"]}} entry) so each nested dict/list element has proper trailing commas and line breaks, then run Black (or apply Black formatting) to the file.

src/utils/vector_search.py (2)
61-68: ⚠️ Potential issue | 🟡 Minor: Clarify that only structured filters are extracted.
Legacy Solr filters remain under `params["solr"]`, so the return docs should not imply every filter moves to top-level `params["filters"]`.
Docstring update:

```diff
     Args:
         solr: Optional structured Solr request (mode and filters from the API).
             - mode: Solr search mode (semantic, hybrid, lexical)
-            - filters: Solr filter payload, may contain structured metadata filters
+            - filters: Solr filter payload; nested structured metadata filters are
+              extracted, while legacy Solr filters remain under ``params["solr"]``

     Returns:
-        Parameter dictionary for ``vector_io.query`` with extracted filters at top level.
+        Parameter dictionary for ``vector_io.query``. Structured metadata filters are
+        placed at top level; legacy Solr filters remain under ``params["solr"]``.
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/vector_search.py` around lines 61 - 68, Update the docstring for the function that builds parameters for vector_io.query to make clear that only structured Solr filters (the `solr["filters"]` structured payload) are extracted into top-level `params["filters"]`, while legacy/other Solr filters remain nested under `params["solr"]`; explicitly mention the `solr` argument, `params["filters"]`, and `params["solr"]` to avoid implying all filters are lifted to top level.
90: ⚠️ Potential issue | 🟡 Minor: Format the dict comprehension with Black.
CI reports this file would be reformatted; split the long comprehension before merging.
Black-style formatting:

```diff
-        remaining_filters = {k: v for k, v in solr.filters.items() if k != "filters"}
+        remaining_filters = {
+            k: v for k, v in solr.filters.items() if k != "filters"
+        }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/vector_search.py` at line 90, The dict comprehension assigning remaining_filters is too long for Black; reformat it into a multi-line comprehension (wrap the comprehension in parentheses and place the for/if parts on separate indented lines) so Black won't reformat it in CI—e.g., break the expression that builds remaining_filters from solr.filters across multiple lines and align the key/value pair, for-loop, and conditional onto their own lines while preserving the same logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/unit/utils/test_vector_search.py`:
- Around line 109-151: Update the two tests
test_with_filters_and_other_solr_params and test_with_compound_filter to assert
the full structured filter payload returned by _build_query_params (not just
type/key/length); for the first test, verify params["filters"] equals the full
dict {"type":"in","key":"version","value":["4.14","4.15"]} and that
params["solr"] equals {"custom_param":"value"} and k is default, and for the
second test assert params["filters"] equals the full nested compound dict (type
"and" with its "filters" list containing the exact eq and ne dicts) and that
"solr" is absent; use the SolrVectorSearchRequest-created expected structures to
compare deep equality.
---
Duplicate comments:
In `@src/models/requests.py`:
- Around line 290-312: The example Solr query literals in src/models/requests.py
(the inline dict/list block containing the hybrid/semantic/filter entries) are
not Black-formatted — add the missing trailing commas and reformat the long
inline dict/list across multiple lines to satisfy Black; specifically update the
three example entries (the dict with "mode": "hybrid", the dict with "mode":
"semantic" and nested "filters" list, and the final {"filters": {"fq":
["product:*openshift*"]}} entry) so each nested dict/list element has proper
trailing commas and line breaks, then run Black (or apply Black formatting) to
the file.
In `@src/utils/vector_search.py`:
- Around line 61-68: Update the docstring for the function that builds
parameters for vector_io.query to make clear that only structured Solr filters
(the `solr["filters"]` structured payload) are extracted into top-level
`params["filters"]`, while legacy/other Solr filters remain nested under
`params["solr"]`; explicitly mention the `solr` argument, `params["filters"]`,
and `params["solr"]` to avoid implying all filters are lifted to top level.
- Line 90: The dict comprehension assigning remaining_filters is too long for
Black; reformat it into a multi-line comprehension (wrap the comprehension in
parentheses and place the for/if parts on separate indented lines) so Black
won't reformat it in CI—e.g., break the expression that builds remaining_filters
from solr.filters across multiple lines and align the key/value pair, for-loop,
and conditional onto their own lines while preserving the same logic.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: f394fddb-4d56-45f0-a386-7dc57aa59c34
📒 Files selected for processing (3)
src/models/requests.py, src/utils/vector_search.py, tests/unit/utils/test_vector_search.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
- GitHub Check: radon
- GitHub Check: mypy
- GitHub Check: check_dependencies
- GitHub Check: build-pr
- GitHub Check: Pylinter
- GitHub Check: integration_tests (3.12)
- GitHub Check: unit_tests (3.13)
- GitHub Check: unit_tests (3.12)
- GitHub Check: E2E: library mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: E2E: library mode / ci / group 1
- GitHub Check: E2E: server mode / ci / group 3
- GitHub Check: E2E Tests for Lightspeed Evaluation job
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-02-23T14:11:46.950Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1198
File: src/utils/responses.py:184-188
Timestamp: 2026-02-23T14:11:46.950Z
Learning: The query request validator in the Responses API flow requires that `query_request.model` and `query_request.provider` must either both be specified or both be absent. The concatenated format (e.g., `model="provider/model"` with `provider=None`) is not permitted by the validator.
Applied to files:
src/models/requests.py
🪛 GitHub Actions: Black
src/utils/vector_search.py
[error] 1-1: Black --check failed: file would be reformatted by Black. Run 'uv tool run black --write src/utils/vector_search.py' (or 'uv tool run black --write .').
src/models/requests.py
[error] 1-1: Black --check failed: file would be reformatted by Black. Run 'uv tool run black --write src/models/requests.py' (or 'uv tool run black --write .').
🔇 Additional comments (4)
src/models/requests.py (2)
285-287: Solr filter description looks aligned. This now documents structured comparison filters while preserving legacy filter-only compatibility.
777-779: Responses API Solr docs look good. The request docstring now reflects structured metadata filters and legacy compatibility.
tests/unit/utils/test_vector_search.py (2)
70-107: Good coverage for legacy and simple structured filters. These cases verify the key behavioral split: legacy `fq` remains under `params["solr"]`, while structured metadata filters move to top-level `params["filters"]`.
163-180: Mode plus legacy filter tests look good. These keep coverage for request mode overrides and default mode behavior while preserving legacy Solr filter passthrough.
```python
    def test_with_filters_and_other_solr_params(self) -> None:
        """Test parameters with both filters and other solr-specific params."""
        solr = SolrVectorSearchRequest.model_validate(
            {
                "filters": {
                    "filters": {
                        "type": "in",
                        "key": "version",
                        "value": ["4.14", "4.15"],
                    },
                    "custom_param": "value",
                },
            },
        )
        params = _build_query_params(solr=solr)

        # Filters extracted to top-level
        assert params["filters"]["type"] == "in"
        assert params["filters"]["key"] == "version"
        # Other params remain under solr key
        assert params["solr"] == {"custom_param": "value"}
        assert params["k"] == constants.SOLR_VECTOR_SEARCH_DEFAULT_K

    def test_with_compound_filter(self) -> None:
        """Test parameters with compound AND filter."""
        solr = SolrVectorSearchRequest.model_validate(
            {
                "filters": {
                    "filters": {
                        "type": "and",
                        "filters": [
                            {"type": "eq", "key": "platform", "value": "openshift"},
                            {"type": "ne", "key": "status", "value": "archived"},
                        ],
                    },
                },
            },
        )
        params = _build_query_params(solr=solr)

        assert params["filters"]["type"] == "and"
        assert len(params["filters"]["filters"]) == 2
        assert "solr" not in params
```
🧹 Nitpick | 🔵 Trivial: Assert the full structured filter payload.
The mixed and compound tests should verify values/nested filters are preserved, not only type, key, and list length.
Strengthen assertions:

```diff
     # Filters extracted to top-level
     assert params["filters"]["type"] == "in"
     assert params["filters"]["key"] == "version"
+    assert params["filters"]["value"] == ["4.14", "4.15"]
     # Other params remain under solr key
     assert params["solr"] == {"custom_param": "value"}
@@
     assert params["filters"]["type"] == "and"
-    assert len(params["filters"]["filters"]) == 2
+    assert params["filters"]["filters"] == [
+        {"type": "eq", "key": "platform", "value": "openshift"},
+        {"type": "ne", "key": "status", "value": "archived"},
+    ]
     assert "solr" not in params
```
assert "solr" not in params📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
def test_with_filters_and_other_solr_params(self) -> None:
    """Test parameters with both filters and other solr-specific params."""
    solr = SolrVectorSearchRequest.model_validate(
        {
            "filters": {
                "filters": {
                    "type": "in",
                    "key": "version",
                    "value": ["4.14", "4.15"],
                },
                "custom_param": "value",
            },
        },
    )
    params = _build_query_params(solr=solr)
    # Filters extracted to top-level
    assert params["filters"]["type"] == "in"
    assert params["filters"]["key"] == "version"
    assert params["filters"]["value"] == ["4.14", "4.15"]
    # Other params remain under solr key
    assert params["solr"] == {"custom_param": "value"}
    assert params["k"] == constants.SOLR_VECTOR_SEARCH_DEFAULT_K


def test_with_compound_filter(self) -> None:
    """Test parameters with compound AND filter."""
    solr = SolrVectorSearchRequest.model_validate(
        {
            "filters": {
                "filters": {
                    "type": "and",
                    "filters": [
                        {"type": "eq", "key": "platform", "value": "openshift"},
                        {"type": "ne", "key": "status", "value": "archived"},
                    ],
                },
            },
        },
    )
    params = _build_query_params(solr=solr)
    assert params["filters"]["type"] == "and"
    assert params["filters"]["filters"] == [
        {"type": "eq", "key": "platform", "value": "openshift"},
        {"type": "ne", "key": "status", "value": "archived"},
    ]
    assert "solr" not in params
```
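As a general illustration (not project code; the values are made up), a deep-equality assertion exposes value-level regressions that a length check silently passes:

```python
# Hypothetical expected/actual filter lists showing why deep equality matters.
expected = [
    {"type": "eq", "key": "platform", "value": "openshift"},
    {"type": "ne", "key": "status", "value": "archived"},
]
actual = [
    {"type": "eq", "key": "platform", "value": "openshift"},
    {"type": "ne", "key": "status", "value": "draft"},  # wrong value
]

# A length check alone cannot tell the two lists apart...
assert len(actual) == len(expected)
# ...while comparing the full payloads catches the regression immediately.
assert actual != expected
```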
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/utils/test_vector_search.py` around lines 109 - 151, Update the
two tests test_with_filters_and_other_solr_params and test_with_compound_filter
to assert the full structured filter payload returned by _build_query_params
(not just type/key/length); for the first test, verify params["filters"] equals
the full dict {"type":"in","key":"version","value":["4.14","4.15"]} and that
params["solr"] equals {"custom_param":"value"} and k is default, and for the
second test assert params["filters"] equals the full nested compound dict (type
"and" with its "filters" list containing the exact eq and ne dicts) and that
"solr" is absent; use the SolrVectorSearchRequest-created expected structures to
compare deep equality.
Description
Updates the Solr vector search integration to support the new Filter API introduced in llama-stack 0.6.0, enabling metadata-based filtering of RAG results using comparison and compound filters.
Filter format examples:

- Simple: `{"filters": {"type": "eq", "key": "platform", "value": "openshift"}}`
- Compound: `{"filters": {"type": "and", "filters": [...]}}`
Note: This change requires lightspeed-providers with solr_vector_io filter support, introduced in lightspeed-core/lightspeed-providers#119
Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit